Statistical Sandhi Splitter and its Effect on NLP Applications
Abstract
This paper revisits the work of Kuncham et al. (2015), who developed a statistical sandhi splitter (SSS) for agglutinative languages and tested it on Telugu and Malayalam. Handling compound words is a major challenge for Natural Language Processing (NLP) applications in agglutinative languages. We therefore test the effect of SSS on three NLP applications, namely Machine Translation, Dialogue Systems and Anaphora Resolution, and show that using SSS consistently improves their accuracy. We also discuss the performance of SSS on these applications in detail.
Similar resources
Statistical Sandhi Splitter for Agglutinative Languages
Sandhi splitting is a primary and an important step for any natural language processing (NLP) application for languages which have agglutinative morphology. This paper presents a statistical approach to build a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language and the output is a split of that string into meaningful word/s. The approach adopte...
Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages
This paper evaluates the challenges involved in shallow parsing of Dravidian languages which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string with morpho-phonemic changes at the point of concatenation. This phenomenon known as Sandhi, in turn complicates the individual word iden...
A Sandhi Splitter for Malayalam
Sandhi splitting is the primary task for computational processing of text in Sanskrit and Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining. This phenomenon is known as Sandhi. Sandhi splitter splits the string of conjoined words into individual words. Accurate execution of sandhi splitting is crucial for text processing tasks ...
External Sandhi and its Relevance to Syntactic Treebanking
External sandhi is a linguistic phenomenon which refers to a set of sound changes that occur at word boundaries. These changes are similar to phonological processes such as assimilation and fusion when they apply at the level of prosody, such as in connected speech. External sandhi formation can be orthographically reflected in some languages. External sandhi formation in such languages, causes...
The Effect of Square Splittered and Unsplittered Rods in Flat Plate Heat Transfer Enhancement
A square splittered and unsplittered rod is placed in a turbulent boundary layer developed over a flat plate. The effect of the resulting disturbances on the local heat transfer coefficient is then studied. In both cases the square rod modifies the flow structure inside the boundary layer. As a result, a stagnation point, a jet and wake area are generated around the square rod, each making a co...
Publication date: 2015